The Digital Library Federation



On May 1, 1995, 16 institutions created the Digital Li- 
brary Federation (additional partners have since 
joined the original 16). The DLF partners have commit- 
ted themselves to "bring together — from across the 
nation and beyond — digitized materials that will be 
made accessible to students, scholars, and citizens 
everywhere." If they are to succeed in reaching their 
goals, all DLF participants realize that they must act 
quickly to build the infrastructure and the institutional 
capacity to sustain digital libraries. In support of DLF 
participants' efforts to these ends, DLF launched this 
publication series in 1999 to highlight and disseminate 
critical work. 
Foreword

Metadata is what makes it possible to locate, provide access to, navi- 
gate, and manage digital information in diverse forms. Ongoing work 
on metadata definition has been critical to the development of digital 
libraries. The extension and refinement of the Dublin Core, efforts to 
establish a set of technical metadata elements for images, and other 
initiatives are expanding the application and usefulness of metadata. 
Echoing earlier published works, this paper emphasizes the impor- 
tance of metadata in those developments. 

The work of the Making of America II Testbed Project reported in 
this paper represents a singular effort in digital library development 
to find ways to provide access to and navigate a variety of materials. 

In this endeavor, a digital library service model has been defined that 
encapsulates the interaction of digital objects (including their metada- 
ta), tools, and services based on principles of object-oriented design. In 
developing the digital library service model, project participants did 
extensive work to identify and define the structural and administra- 
tive (often referred to as technical) metadata elements that are crucial 
in the development of the digital library services and tools. 

The Digital Library Federation's support of this work was driven 
by two of its program priorities: to stimulate the development of a 
core digital library infrastructure and to organize, provide access to, 
and preserve knowledge. This publication — DLF's third — furthers the 
interests of the Federation and its members by presenting one possible 
model of digital library development for review and discussion within 
the DLF community and the digital library community at large. 
Reader's Guide

Drawing on the example of the Making of America II Testbed Project, 
this report examines an object-oriented approach to digital library 
construction, the collection of structural and administrative metadata, 
and the development of tools to assist scholars. It is divided into four 
main parts. Readers should approach the report part by part, focusing 
on those areas of particular interest. 

• The Executive Summary provides an overview of the MoA II Test- 
bed Project and describes the content and objectives of this report. 

• Part I, Project Background, describes the history of the project and
outlines the activities to be undertaken during each of the three 
phases. 

• Part II, The MoA II Digital Library Service Model, reviews the technical
details of the model for digital library objects. It briefly describes
the three layers of the service model: services, tools, and digital
library objects.

• Part III, Implementing the Service Model, is the most detailed section
of the report. It discusses the use of tools in the digital library, 
presents an overview of structural and administrative metadata, 
and provides recommendations for the collection of metadata. 

Recommendations for imaging are not covered in this report. This 
topic will be covered extensively in Guides to Quality in Visual Resource 
Imaging, which the Council on Library and Information Resources and
The Research Libraries Group will publish on the Web in early 2000. 
EXECUTIVE SUMMARY



The Making of America II Testbed Project, coordinated by the
Digital Library Federation (DLF), is a multiphase endeavor. 
Its purpose is to investigate important issues in the cre- 
ation of an integrated but distributed digital library of ar- 
chival materials (that is, digitized surrogates of primary source mate- 
rials found in archives and special collections). Drafted during the 
MoA II planning phase, this report identifies a starting point for the 
testbed that is being created in the production phase of this project, 
which is funded by the National Endowment for the Humanities.

The library community has a distinguished history of develop- 
ing standards to enhance the discovery and sharing of print materi- 
als: they include, for example, MARC, Z39.50, and interlibrary loan 
protocols. This leadership continues today, as libraries create new 
best practices and standards that address digital collections and con- 
tent issues. The primary goal of this report is to open a dialogue 
about digital library standards, specifically, to discuss any new best 
practices and standards that will be required to enable the digital li- 
brary to meet traditional collection, preservation, and access objec- 
tives. 

This report asks the question, "How can we create integrated
digital library services that operate across multiple, distributed
repositories?" Existing standards and best practices clearly play an
important role in answering this question. However, this report and the
MoA II Testbed Project raise a new area of discussion that goes
beyond the discovery of a digital object and addresses how it is handled.
The report and the testbed focus on the need to develop standards 
for creating and encoding digital representations of archival objects 
(for example, a digitized photograph or a digital representation of a 
book or diary). If tools are to be developed that work with digitized 
archival objects across distributed repositories, these objects will re- 
quire some form of standardization. 

This report begins the discussion of digital object definitions by 
developing and examining metadata standards for digital represen- 
tations of a variety of archival objects, including text, digitized page 
images, photographs, and other forms. For the purposes of this re- 
port, there are three types of metadata: descriptive, structural, and 
administrative. Descriptive metadata are used to discover the object. 
A researcher may use descriptive metadata to limit a search by title 
and author in an OPAC or other database. Structural metadata define 
the object's internal organization and are needed for display and 
navigation of that object. For instance, structural metadata may con- 
tain information about the number of pages an object contains and 
what order they should be viewed in. Administrative metadata
contain the management information needed to keep the object over
time and to identify artifacts that might have been introduced during 
its production and management. For example, administrative meta- 
data indicate when the object was digitized, at what resolution, and 
who can access it. 

The project testbed proposes to use existing descriptive metadata 
standards, such as MARC records and the Dublin Core, as well as 
standards that incorporate both descriptive and structural metadata, 
such as the Encoded Archival Description (EAD), to help the user 
locate a particular digital object. This report proposes defining new 
standards for the structural and administrative metadata needed to 
view and manage digital objects. 

At a higher level, the report proposes a Digital Library Service 
Model in which services are based on tools that work with the digital 
objects from distributed repositories. This approach borrows from 
the popular object-oriented design model. It defines a digital object 
as encapsulating content, metadata, and methods. Methods are pro- 
gram code segments that allow the object to perform services for 
tools (for example "Get the next page of this digital diary"). Unlike 
other models, the Digital Library Service Model includes methods as 
part of the object. 

The report also identifies several archival digital object classes 
that are being examined as part of the MoA II project, including pho- 
tographs, photograph albums, diaries, journals, letterpress books, 
ledgers, and correspondence. One of the objectives for the testbed is 
to develop the tools that display and navigate these MoA II objects, 
some of which have complex internal organization. Therefore, anoth- 
er goal of this report is to identify the structural metadata elements 
that are needed to support display and navigation and ensure that 
they are included as part of the digital objects. Finally, this report be- 
gins to examine the methods (program code) that could be included 
with each class of object. 

After the library and archival communities have reviewed this 
report, MoA II participants will incorporate reader feedback into the 
development of digital object definitions for the classes of materials 
to be examined in the MoA II testbed. These definitions will specify 
how to encode the content, metadata, and methods as part of the ob- 
ject. An important goal of the project is to use the testbed to investi- 
gate the advantages and limitations of these definitions and stimu- 
late discussion of standards for digital library objects and best 
practices for digitizing archival materials. This discussion must in- 
clude the project participants, DLF members, and representatives of 
the wider community. In addition, the project will contribute to the 
DLF Architecture Committee's ongoing discussion of distributed sys- 
tem architectures for digital libraries. The MoA II testbed will give 
the library and archival communities a tool they can use to test, eval- 
uate, and refine digital library object definitions and digitization 
practices. It is expected that these discussions will move the archival 
and library communities closer to a consensus on standards and best 
practices in these areas. 
PART I: PROJECT BACKGROUND



In 1998, the DLF, working with staff of the University of California 
(UC) at Berkeley, developed a grant proposal that requested support 
to create a testbed for its Making of America II project. The objective 
of the testbed was to move the DLF members and the wider library 
and archival communities closer to the realization of a national digi- 
tal library by addressing several issues that are critical to this goal. 

UC Berkeley submitted the proposal to the National Endowment 
for the Humanities (NEH), which awarded funding. The proposed 
project team included individuals associated with UC Berkeley and
four other DLF member institutions: Cornell University, the New 
York Public Library, Pennsylvania State University, and Stanford 
University. 

As described in the proposal, the MoA II testbed is designed to 
provide a means for the DLF to investigate, refine, and recommend 
metadata elements and encodings used to discover, display, and nav- 
igate digital archival objects. The DLF expects that the MoA II test- 
bed will generate a working system for investigating metadata prob- 
lems and for discussing, testing, and refining different solutions. The 
project will give DLF members information that can be used to create 
the necessary standards or recommendations for best practices for 
each research area. The project will also be of value to the library and 
archival communities as a whole because it will advance discussion 
of the nature of the digital library and move libraries toward a con- 
sensus. 

The project has three phases: planning, research and production, 
and dissemination. The planning phase was funded by the DLF. Dur- 
ing the research and production phase, which is funded by the NEH 
and is currently under way, theories developed in the planning 
phase are being tested. In the dissemination phase, the project will 
share its tested ideas and practices with the broader community. 



Planning Phase (October 1997-May 1998) 

Participants in the planning phase decided that the MoA II Testbed 
Project must engage scholars, archivists, and librarians interested in 
access to the digital materials represented in the project, as well as 
metadata and technical experts. The following four activities were 
recommended: 

1. UC Berkeley would work with representatives from Cornell, the 
New York Public Library, Penn State, and Stanford, and with 
consultants and selected archivists, to review the collections pro- 
posed for conversion and identify the classes of digital archival 
objects to be represented in the testbed. The classes could include 
formats such as correspondence, photographs, diaries, and led- 
gers. (The MoA II Steering Committee recommended before the 
start of the project that books and serial articles be considered 
outside the scope of this project.) 

2. UC Berkeley, working with the same group, would draft a paper 
that identified the behaviors of each class of digital objects and 
the structural and administrative metadata to support those
behaviors. In addition, the paper would suggest initial best practices
for digitizing the classes of archival objects to be included in
the project. Finally, it would include a compilation of existing 
work in these areas as well as any original contributions the 
group could provide. 

3. The participants in the MoA II Testbed Project and the DLF
Architecture Committee would review the draft paper. It would
then be revised and distributed to the wider community for
review.

4. Technical experts at UC Berkeley would analyze the paper and 
design a means of encoding the behaviors, metadata, and objects 
for implementation during the research and production phase of 
the project. 



Research and Production Phase (May 1998-March 2000)

The MoA II testbed would be used to investigate, refine, and en- 
hance the working definitions of administrative and structural meta- 
data, and the important behaviors of archival objects. The testbed 
project has the following goals, defined during the planning phase: 

• to create tools that help the library community understand how 
digital archival objects are discovered, displayed, and navigated;

• to understand how these tools use metadata and what value the 
metadata provide and at what cost; and 

• to give the DLF a set of metadata practices that can be reviewed 
and recommended to the wider community. 

Dissemination Phase (Summer 2000) 

When the research and production phase has ended, the MoA II Test- 
bed Project will seek funding for an invitational seminar at which 
project results will be reviewed. Participants will include digital li- 
brary experts, archivists and special collections librarians, scholars, 
computer scientists, museum professionals, and others who have 
participated in developing the EAD protocols, are engaged in similar 
work, or have appropriate expertise. At the end of this phase, project 
results will be disseminated, practices established will be refined, as 
necessary, and an agenda for further community review will be for- 
mulated. 
PART II: THE MOA II DIGITAL LIBRARY SERVICE MODEL
Overview

The digital library service model developed for the MoA II Testbed 
Project has three layers: services, tools, and digital library objects (fig. 1). 
In this model, services are provided through tools that discover, dis- 
play, navigate, and manipulate digital objects from distributed repos- 
itories. 

This report also proposes a digital object model that fits within 
the service model. The object model defines digital objects, which are 
the foundation of the service model, as an encapsulation of content, 
metadata, and methods. 

Each of the layers in the model may be described as follows. 



Services Layer 

This layer describes the services to be provided for a specific group 
of users. Because the MoA II Testbed Project relates to scholars' use 
of archival materials, these services could include the discovery, dis- 
play, navigation, and manipulation of digital surrogates made from 
these collections. The specific service model used in this project fol- 
lows the standard archival model; that is, materials can be discov- 
ered via USMARC collection-level records in a catalog. The catalog 
records can link the user to the related finding aid that describes the 
collection in more detail, and the finding aids can link to individual 
digitized archival materials. 
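
The discovery path described above (catalog record, then finding aid, then individual digitized items) can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the class names, the discover() function, and the sample titles and identifiers are hypothetical and do not represent the testbed's catalog or its USMARC and EAD encodings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DigitalObject:
    identifier: str  # hypothetical identifier of a digitized item


@dataclass
class FindingAid:
    title: str
    objects: List[DigitalObject] = field(default_factory=list)


@dataclass
class CatalogRecord:
    title: str               # stands in for a collection-level record
    finding_aid: FindingAid  # link from the record to its finding aid


def discover(catalog, term):
    """Follow record -> finding aid -> digitized items for matching records."""
    hits = []
    for record in catalog:
        if term.lower() in record.title.lower():
            hits.extend(obj.identifier for obj in record.finding_aid.objects)
    return hits


# Invented sample data, for illustration only.
aid = FindingAid("Guide to the Example Family Papers",
                 [DigitalObject("obj-1"), DigitalObject("obj-2")])
catalog = [CatalogRecord("Example Family Papers", aid)]
print(discover(catalog, "papers"))
```

The sketch keeps the three levels separate so that each link in the chain can be replaced by a real catalog, finding aid database, or repository without changing the others.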

The services layer contains a suite of tools to support the needs 
of a particular group of users. For example, scholars would be com- 
fortable using sophisticated electronic finding aids to locate and 
view digital archival materials such as photographs or diaries. How- 
ever, fifth-graders, with less rigorous information needs, may require 
simpler tools to discover and view these items. 
Tools Layer

This layer contains the tools that serve the user. The MoA II tools 
consist of the following: 

• an online catalog for the discovery and display of the USMARC 
collection-level records; 

• a standard generalized markup language (SGML)-compliant da- 
tabase that will be used to search, display, and navigate the 
EAD-compliant electronic finding aids; and 

• tools to display and navigate the MoA II-compliant digital archival
objects. (Objects are MoA II-compliant when they can be
delivered using the proposed encoding standards described later in
this paper.)

Any tool is actually a suite of behaviors, or actions. With a digital
diary, for example, such behaviors could include
"Turn to the next page," "Go to the previous page," "Jump to
Chapter 3," or "Translate this page into French."



Digital Library Objects Layer 

This layer contains the actual digital objects that populate distributed 
network repositories. Objects of the same class share encoding stan- 
dards that encapsulate (that is, include) their content, metadata, and 
methods. Separate classes of digital objects could be defined for 
books, continuous-tone photographs, diaries, and other objects. 



A Model for Digital Library Objects 

Digital library objects form the foundation of the digital library ser- 
vice model. It is now possible to create a digital object model for 
these objects that will fit within the overall service model. 



Adding Classes and Content to the MoA II Object Model 

The MoA II object model defines classes of digital archival objects 
(for example, diaries, journals, photographs, and correspondence). 
Each object in a given class has content that is a digital representa- 
tion of a particular item. The content can be digitized page images, 
ASCII text, numeric data sets, and other formats. The following are 
examples of three classes of archival objects and their content format: 

• a photograph made up of a single digitized tagged image file 
format (TIFF) image 

• a photo album made up of 30 photograph objects 

• a diary made up of 200 digitized TIFF page images and textual 
transcriptions 

The object model starts by defining classes of archival objects in
a system under which each object has content that is an electronic
representation of a particular archival item of that class.
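
The three example classes can be sketched as simple Python data classes. This is a minimal illustration, not a proposed encoding: the class and field names and the file-naming pattern are assumptions, and only the counts (one image, 30 photographs, 200 page images with transcriptions) come from the examples above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Photograph:
    """A photograph whose content is a single digitized TIFF image."""
    image_file: str  # hypothetical file name


@dataclass
class PhotoAlbum:
    """A photo album whose content is a sequence of photograph objects."""
    photographs: List[Photograph] = field(default_factory=list)


@dataclass
class Diary:
    """A diary whose content is page images plus textual transcriptions."""
    page_images: List[str] = field(default_factory=list)
    transcriptions: List[str] = field(default_factory=list)


# Invented content, sized to match the examples in the text.
album = PhotoAlbum([Photograph(f"photo_{n:02d}.tif") for n in range(1, 31)])
diary = Diary(
    page_images=[f"page_{n:03d}.tif" for n in range(1, 201)],
    transcriptions=[f"(transcription of page {n})" for n in range(1, 201)],
)
print(len(album.photographs), len(diary.page_images))
```

Note that the album is built from photograph objects rather than from image files directly, which mirrors the way one class of object can be composed of objects from another class.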

Adding Metadata to the MoA II Object Model 

For the purposes of this discussion, metadata are considered as sepa- 
rate from content. Metadata are data that in some manner describe 
the content. The DLF systems architecture committee has identified 
three types of metadata: 

1. Descriptive metadata are used in the discovery and identifica- 
tion of an object. Examples include MARC and Dublin Core 
records. 

2. Structural metadata are used to display and navigate a particular
object for a user. They include information on the internal
organization of an object.1 For example, a given diary has three
volumes. Volume I has two sections: dated entries and accounts.
The dated entries section has 200 entries; entry 20 is dated
August 4, 1890, and starts on page 50 of Volume I.

3. Administrative metadata represent the management information 
for the object, including the date it was created, its content file 
format, and rights information. 

Metadata can now be added to the model. Any class of archival 
object encapsulates both content and metadata, where the metadata 
are used to discover, display, navigate, manipulate, and learn more 
about a particular object's management information. 

The distinction among the three types of metadata is not abso- 
lute. For example, chapters are part of the structure of a book, but 
chapter headings may be indexed to aid in the discovery of the item, 
thus filling one of the roles of descriptive metadata. In fact, the text 
of a book itself could be indexed and used for discovery. 
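
As a rough illustration, the three types of metadata for the diary example might be held in a structure like the following Python sketch. The class names, field names, and the find_start_page() helper are assumptions made for this example, and the descriptive and rights values are placeholders; only the diary's organization (three volumes, two sections in Volume I, 200 dated entries, entry 20 dated August 4, 1890, starting on page 50) is taken from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    number: int
    date: str
    start_page: int


@dataclass
class Section:
    name: str
    entry_count: int = 0
    entries: List[Entry] = field(default_factory=list)


@dataclass
class Volume:
    label: str
    sections: List[Section] = field(default_factory=list)


@dataclass
class DiaryMetadata:
    descriptive: Dict[str, str]     # used to discover the object
    structural: List[Volume]        # used to display and navigate it
    administrative: Dict[str, str]  # used to manage it


def find_start_page(meta, volume, section, entry_number) -> Optional[int]:
    """Navigate the structural metadata to the page where an entry begins."""
    for vol in meta.structural:
        if vol.label != volume:
            continue
        for sec in vol.sections:
            if sec.name != section:
                continue
            for entry in sec.entries:
                if entry.number == entry_number:
                    return entry.start_page
    return None


meta = DiaryMetadata(
    descriptive={"title": "(placeholder title)"},
    structural=[
        Volume("Volume I", [
            # Only entry 20 is recorded here; the other 199 are omitted.
            Section("dated entries", entry_count=200,
                    entries=[Entry(20, "1890-08-04", 50)]),
            Section("accounts"),
        ]),
        Volume("Volume II"),
        Volume("Volume III"),
    ],
    administrative={"content_format": "TIFF", "rights": "(placeholder)"},
)
print(find_start_page(meta, "Volume I", "dated entries", 20))
```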

Adding Methods to the MoA II Object Model 

Several concepts used in this paper, including methods, originate 
from object-oriented design (OOD). 

Object-Oriented Design as Part of the Object Model 
The popularity of OOD is evident in the widespread use of related
programming languages such as C++ and Java. Some of the reasons
for this popularity also make OOD an attractive addition to the digital
library service model. In particular, OOD actually models users'
behaviors, making it easier to translate their needs accurately
into system applications. This advantage will be discussed in more
detail later.

1 Structural metadata exist in various levels of complexity. The diary example
above represents a rich structure that may be created for an important work and
would include a transcription of the digitized handwritten pages. The structure
of the diary could be encoded in this transcription, and the structural metadata
could be extracted from it. At the other extreme, a diary could exist with only
enough structural metadata to turn the pages.

Object-oriented design has another important advantage. In 
OOD, a digital object conceptually encapsulates both content and 
methods. Methods are program code segments that allow an object to 
perform services for tools. These methods are part of the object and 
can be used by developers to interact with the content. For example, 
a developer can ask a digital book object named Book1 for page 25
by executing that object's get_page() method and specifying page 25.
This method call may look something like Book1.get_page(25).
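
The method call above can be sketched in Python. This is a minimal illustration rather than the testbed's implementation: the DigitalBook class, its page_count() helper, the 40-page length, and the page file names are assumptions; only the book object and its get_page() call come from the text.

```python
class DigitalBook:
    """A digital object that keeps its content private and exposes methods."""

    def __init__(self, pages):
        self._pages = list(pages)  # content: one entry per page image

    def page_count(self):
        return len(self._pages)

    def get_page(self, number):
        """Return the page with the given 1-based page number."""
        if not 1 <= number <= len(self._pages):
            raise IndexError(f"no page {number} in this book")
        return self._pages[number - 1]


# A hypothetical 40-page book with invented file names.
Book1 = DigitalBook(f"page_{n:03d}.tif" for n in range(1, 41))
print(Book1.get_page(25))
```

A tool that calls get_page() does not need to know how or where the page images are stored, which is the point of placing the method with the object.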

The most important advantage of making methods part of the 
object may be that these basic program segments do not then have to 
be reinvented by every developer.2 Instead, the developer can have 
the tool ask the object's existing method to perform the needed work. 
This makes the development of new tools faster and easier. Since 
tools directly support the end user in this model, their development 
should be encouraged. 
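
Note 2 stresses that this object model is conceptual: content, metadata, and methods are assembled into an object only when needed. The following Python sketch shows one way such assembly could work. The three dictionaries stand in for separate persistent stores, and every name here (the stores, assemble(), the identifier diary-001) is hypothetical.

```python
import functools
from types import SimpleNamespace

# Stand-ins for persistent stores that could live anywhere on the network.
CONTENT_STORE = {"diary-001": ["page_001.tif", "page_002.tif"]}
METADATA_STORE = {"diary-001": {"object_class": "diary"}}


def get_page(obj, number):
    """Return the page with the given 1-based number from an assembled object."""
    return obj.content[number - 1]


# Methods are stored once per class of object, not once per object.
METHOD_STORE = {"diary": {"get_page": get_page}}


def assemble(object_id):
    """Build an object on demand from its content, metadata, and class methods."""
    metadata = METADATA_STORE[object_id]
    obj = SimpleNamespace(content=CONTENT_STORE[object_id], metadata=metadata)
    for name, func in METHOD_STORE[metadata["object_class"]].items():
        # Bind each class-level method to this particular object.
        setattr(obj, name, functools.partial(func, obj))
    return obj


diary = assemble("diary-001")
print(diary.get_page(2))
```

Because the methods are looked up by object class, a tool written against get_page() works with any object of that class, however its parts are stored.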

Defining the Difference between Behaviors and Methods 

One great advantage of the object-oriented design approach is that it
models users' behavior with methods. There is a clear distinction
between user-level behaviors and methods. The word behaviors relates to
how users describe what tools can do for them, for example, "Zoom
in on this area of a photograph," "Show me this diary," "Display the
next page of this book," or "Translate this page into French." The
word methods refers to how system designers describe what tools can
do for a user.

One important reason for distinguishing between behaviors and 
methods is to establish a process that will enable libraries to engage 
their users in a dialogue about what services and tools they require, 
down to the behaviors they need in each tool. Software engineers can 
then map the user behaviors into sets of methods that are required to 
perform the necessary functions. The line between behaviors and 
methods represents the transition from user requirements to system 
design. 
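
The mapping from user behaviors to methods can be sketched as follows. This Python example is an assumption-laden illustration: the Diary and DiaryViewer classes, the BEHAVIORS table, and the perform() function are hypothetical, and the behavior phrases are adapted from examples in this report.

```python
class Diary:
    """Digital object: holds content and offers methods to tools."""

    def __init__(self, pages):
        self._pages = list(pages)

    def page_count(self):
        return len(self._pages)

    def get_page(self, number):
        return self._pages[number - 1]  # 1-based page number


class DiaryViewer:
    """Tool: tracks the reader's position and calls the object's methods."""

    def __init__(self, diary):
        self.diary = diary
        self.current = 1

    def show(self):
        self.current = 1
        return self.diary.get_page(self.current)

    def next_page(self):
        # Stay on the last page rather than run past the end.
        self.current = min(self.current + 1, self.diary.page_count())
        return self.diary.get_page(self.current)

    def previous_page(self):
        # Stay on the first page rather than run past the beginning.
        self.current = max(self.current - 1, 1)
        return self.diary.get_page(self.current)


# User requirements on the left, system design on the right.
BEHAVIORS = {
    "Show me this diary": DiaryViewer.show,
    "Display the next page": DiaryViewer.next_page,
    "Go to the previous page": DiaryViewer.previous_page,
}


def perform(viewer, behavior):
    """Carry out a user-level behavior by calling the mapped method."""
    return BEHAVIORS[behavior](viewer)


viewer = DiaryViewer(Diary(["page_001.tif", "page_002.tif", "page_003.tif"]))
print(perform(viewer, "Show me this diary"))
print(perform(viewer, "Display the next page"))
```

The BEHAVIORS table marks the line between the two vocabularies: its keys are phrased as users would state them, and its values are the methods a designer would implement.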

The following examples of user-level behaviors might be relevant
to an item in the digital library class "diary":

• "Show me the organization of this diary." (It may have three volumes,
each of which includes sections on dated entries, accounts,
and quotes.)

• "Show me the first page of Volume 1."

• "Show me page 3, the next page, or the previous page."



2 This digital object model is only conceptual. Complete objects made up of 
metadata, data, and methods do not sit in a repository waiting for use. Instead, 
they are created as needed. That is, the parts of the objects (methods, metadata, 
and content) are assembled from different areas of persistent store located 
anywhere on the network. Using the object-oriented model does not require a 
repository to use specific object technologies such as object-oriented databases. 
Relational databases, for example, could be used for the persistent storage. 



